chore: add support for quantized versions of CV models CLIP, Style Transfer, EfficientNetV2, SSDLite #940

Merged
msluszniak merged 11 commits into main from @bh/quantize-cv-models on Mar 11, 2026

Conversation

@barhanc (Contributor) commented Mar 6, 2026

Description

Adds support for quantized versions of CV models CLIP, Style Transfer, EfficientNetV2, SSDLite and updates paths to non-quantized models exported with ExecuTorch v1.1.0.

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

  1. Run the Computer Vision example app:
    • Object detection with model set to:
      • SSDLITE_320_MOBILENET_V3_LARGE
    • Classification with model set to:
      • EFFICIENTNET_V2_S,
      • EFFICIENTNET_V2_S_QUANTIZED
    • Style transfer with model set to:
      • STYLE_TRANSFER_CANDY,
      • STYLE_TRANSFER_MOSAIC,
      • STYLE_TRANSFER_UDNIE,
      • STYLE_TRANSFER_RAIN_PRINCESS,
      • STYLE_TRANSFER_CANDY_QUANTIZED,
      • STYLE_TRANSFER_MOSAIC_QUANTIZED,
      • STYLE_TRANSFER_UDNIE_QUANTIZED,
      • STYLE_TRANSFER_RAIN_PRINCESS_QUANTIZED,
  2. Run the Text Embeddings example app:
    • CLIP embeddings with image model set to:
      • CLIP_VIT_BASE_PATCH32_IMAGE,
      • CLIP_VIT_BASE_PATCH32_IMAGE_QUANTIZED
  3. Check HF pages for updated models:
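
The constants in the test plan above follow a simple naming convention: each quantized variant appends a `_QUANTIZED` suffix to its base constant. A minimal sketch of building the test matrix from that convention (the helper and the matrix are illustrative, not part of the library):

```typescript
// Base model constants taken from the test plan above.
const STYLE_TRANSFER_MODELS = [
  'STYLE_TRANSFER_CANDY',
  'STYLE_TRANSFER_MOSAIC',
  'STYLE_TRANSFER_UDNIE',
  'STYLE_TRANSFER_RAIN_PRINCESS',
] as const;

// Hypothetical helper: quantized variants append a `_QUANTIZED` suffix.
function quantizedVariant(model: string): string {
  return `${model}_QUANTIZED`;
}

// Pairs of (base, quantized) constant names to cycle through while testing.
const testMatrix = STYLE_TRANSFER_MODELS.map((m) => [m, quantizedVariant(m)]);
```

This keeps the test plan in one place: adding a new style-transfer model to the array automatically adds its quantized counterpart to the matrix.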

Screenshots

Related issues

Closes #719

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

@barhanc barhanc self-assigned this Mar 6, 2026
@barhanc barhanc added labels chore (PRs that are chores) and model (Issues related to exporting, improving, fixing ML models) on Mar 6, 2026
@NorbertKlockiewicz (Contributor)

I will run the new models later today to see if they work. I think you should also benchmark them and add the results to our docs.

You can ask @IgorSwat for the tips about benchmarking ;D

@IgorSwat (Contributor) commented Mar 6, 2026

@barhanc Did you profile the added XNNPACK models following these instructions?

I guess they should be fine considering that they are only quantized versions of already profiled models, but it's always nice to check that everything is alright with the export.

@barhanc (Contributor, Author) commented Mar 8, 2026

I've added the profiling results to the corresponding READMEs in the internal exports GitLab. They all look fine to me (>80% of ops delegated), but you can also take a look to make sure everything is correct.

@NorbertKlockiewicz (Contributor) left a comment:

Please change examples in demo apps to use quantized models by default.

Comment thread on packages/react-native-executorch/src/constants/modelUrls.ts
@msluszniak (Member) commented Mar 10, 2026

Please add to the benchmark section how memory usage was measured.

Comment on lines +377 to +382
? `${URL_PREFIX}-efficientnet-v2-s/${NEXT_VERSION_TAG}/coreml/efficientnet_v2_s_coreml_fp32.pte`
: `${URL_PREFIX}-efficientnet-v2-s/${NEXT_VERSION_TAG}/xnnpack/efficientnet_v2_s_xnnpack_fp32.pte`;
const EFFICIENTNET_V2_S_QUANTIZED_MODEL =
Platform.OS === `ios`
? `${URL_PREFIX}-efficientnet-v2-s/${NEXT_VERSION_TAG}/coreml/efficientnet_v2_s_coreml_fp16.pte`
: `${URL_PREFIX}-efficientnet-v2-s/${NEXT_VERSION_TAG}/xnnpack/efficientnet_v2_s_xnnpack_int8.pte`;
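
The platform switch in the snippet above can be expressed as a pure helper, which makes the URL construction unit-testable. This is a sketch only: `urlPrefix` and `versionTag` stand in for the real `URL_PREFIX` and `NEXT_VERSION_TAG` values, and the function name is hypothetical.

```typescript
// Sketch of the per-platform model URL selection shown in the diff above.
// On iOS the quantized constant resolves to a CoreML fp16 export; everywhere
// else it falls back to an XNNPACK int8 export.
function efficientNetV2SQuantizedUrl(
  os: 'ios' | 'android',
  urlPrefix: string,
  versionTag: string
): string {
  return os === 'ios'
    ? `${urlPrefix}-efficientnet-v2-s/${versionTag}/coreml/efficientnet_v2_s_coreml_fp16.pte`
    : `${urlPrefix}-efficientnet-v2-s/${versionTag}/xnnpack/efficientnet_v2_s_xnnpack_int8.pte`;
}
```

Note that "quantized" here means different things per platform (fp16 on CoreML vs. int8 on XNNPACK), which is part of what the discussion below is about.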
Collaborator:

I'm wondering whether we should silently run CoreML, or let the user handle it. Some users might prefer consistency across platforms over performance. cc @NorbertKlockiewicz

Contributor:

I think we should discuss this and plan accordingly. The simplest way would be to have a constant for both the xnnpack and coreml model, but with quantized models that's already 4 constants. Maybe we can figure out a better way so users can easily switch between models delegated to different backends. Answering your question: we should definitely let users select it.

@barhanc (Contributor, Author):

I guess we can add constants like EFFICIENTNET_V2_S_XNNPACK_FP32 that unambiguously signal both the backend and the precision for "power users" who want control, and additionally have simple constants like EFFICIENTNET_V2_S which will just be the fastest variant of the model on the given platform.
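
That proposal could look roughly like this (all constant names, file names, and the platform-default mapping are illustrative, not the merged implementation):

```typescript
// Explicit backend + precision constants for power users (hypothetical names).
const EFFICIENTNET_V2_S_XNNPACK_FP32 = 'efficientnet_v2_s_xnnpack_fp32.pte';
const EFFICIENTNET_V2_S_XNNPACK_INT8 = 'efficientnet_v2_s_xnnpack_int8.pte';
const EFFICIENTNET_V2_S_COREML_FP16 = 'efficientnet_v2_s_coreml_fp16.pte';

// Simple alias: resolves to the fastest variant on the given platform.
function efficientNetV2SDefault(os: 'ios' | 'android'): string {
  return os === 'ios'
    ? EFFICIENTNET_V2_S_COREML_FP16
    : EFFICIENTNET_V2_S_XNNPACK_INT8;
}
```

The explicit constants give cross-platform consistency to those who want it, while the alias keeps the common case a one-liner.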

Contributor:

Let's keep it as it is now and we will do it properly in another PR.

@barhanc barhanc force-pushed the @bh/quantize-cv-models branch from 7f903fd to e2f9f97 on March 11, 2026 01:40
@barhanc barhanc force-pushed the @bh/quantize-cv-models branch from e2f9f97 to 6556b74 on March 11, 2026 01:41
@NorbertKlockiewicz NorbertKlockiewicz self-requested a review March 11, 2026 08:13
@msluszniak (Member) commented Mar 11, 2026

Just a question: why is there only an f32 version of react-native-executorch-ssdlite320-mobilenet-v3-large for xnnpack? And is there any reason why there is no CoreML export for react-native-executorch-clip-vit-base-patch32? If there were problems with the exports, I think we should create issues on the ExecuTorch repo for them.

@barhanc (Contributor, Author) commented Mar 11, 2026

@msluszniak I didn't export the CLIP vision model to the CoreML backend because the xnnpack variant is already extremely fast on iOS (<20 ms forward() on iPhone 17 Pro) and I don't think we would see any benefit from CoreML in that case, but I can add it, of course. I had some problems with quantizing SSDLite, and since xnnpack fp32 is already fast (<20 ms forward() on a Pixel 10), I didn't think it was worth it.

@msluszniak (Member):

The question is whether they are also super fast on low-tier devices. The fact that the models are super fast on top devices doesn't mean there is nothing to gain on slower ones ;) but of course you are in a better position right now to decide whether this is worth trying.

@msluszniak (Member) left a comment:

Ok, I tested all the models, and everything works correctly 🚀

@msluszniak msluszniak merged commit 52756c7 into main Mar 11, 2026
5 checks passed
@msluszniak msluszniak deleted the @bh/quantize-cv-models branch March 11, 2026 18:10
